Closed Itemset Mining and Non-redundant Association Rule Mining

Author

  • Mohammed J. Zaki
Abstract

DEFINITION Let I be a set of binary-valued attributes, called items. A set X ⊆ I is called an itemset. A transaction database D is a multiset of itemsets, where each itemset, called a transaction, has a unique identifier, called a tid. The support of an itemset X in a dataset D, denoted sup(X), is the fraction of transactions in D that contain X as a subset. X is said to be a frequent itemset in D if sup(X) ≥ minsup, where minsup is a user-defined minimum support threshold. A (frequent) itemset is called closed if it has no (frequent) proper superset with the same support. An association rule is an expression A ⇒ B, where A and B are itemsets and A ∩ B = ∅. The support of the rule is the joint probability of a transaction containing both A and B, given as sup(A ⇒ B) = P(A ∧ B) = sup(A ∪ B). The confidence of a rule is the conditional probability that a transaction contains B, given that it contains A: conf(A ⇒ B) = P(B | A) = P(A ∧ B) / P(A) = sup(A ∪ B) / sup(A). A rule is frequent if the itemset A ∪ B is frequent. A rule is confident if conf ≥ minconf, where minconf is a user-specified minimum confidence threshold. The aim of non-redundant association rule mining is to generate a rule basis, a small, non-redundant set of rules, from which all other association rules can be derived.
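
The definitions above can be made concrete with a small sketch. The six-transaction database and the function names below are illustrative assumptions, not taken from the paper, and the brute-force enumeration is only meant to mirror the definitions; it is not an efficient closed itemset miner. Supports are kept as exact fractions so that equal supports compare equal.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database: tid -> transaction (a set of items).
D = {
    1: frozenset("ACTW"),
    2: frozenset("CDW"),
    3: frozenset("ACTW"),
    4: frozenset("ACDW"),
    5: frozenset("ACDTW"),
    6: frozenset("CDT"),
}


def sup(X, D):
    """Fraction of transactions in D that contain itemset X as a subset."""
    X = frozenset(X)
    return Fraction(sum(1 for t in D.values() if X <= t), len(D))


def conf(A, B, D):
    """conf(A => B) = sup(A ∪ B) / sup(A), for disjoint itemsets A and B."""
    A, B = frozenset(A), frozenset(B)
    if A & B:
        raise ValueError("A and B must be disjoint")
    if sup(A, D) == 0:
        raise ValueError("confidence is undefined when sup(A) = 0")
    return sup(A | B, D) / sup(A, D)


def frequent_itemsets(D, minsup):
    """Enumerate every non-empty itemset X with sup(X) >= minsup (brute force)."""
    items = sorted(set().union(*D.values()))
    result = {}
    for k in range(1, len(items) + 1):
        for X in combinations(items, k):
            s = sup(X, D)
            if s >= minsup:
                result[frozenset(X)] = s
    return result


def closed_itemsets(freq):
    """Keep the frequent itemsets that have no proper superset of equal support."""
    return {
        X: s
        for X, s in freq.items()
        if not any(X < Y and s == sY for Y, sY in freq.items())
    }


freq = frequent_itemsets(D, Fraction(1, 2))
closed = closed_itemsets(freq)
print(len(freq), "frequent itemsets,", len(closed), "of them closed")
print("conf(A => W) =", conf("A", "W", D))
print("conf(W => A) =", conf("W", "A", D))
```

In this toy database the itemset {A} is frequent but not closed, because {A, C, W} occurs in exactly the same transactions. The closed itemsets therefore summarize the frequent ones without losing any support information, which is the property that rule bases build on.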

Similar resources

Mining Non-Redundant Frequent Pattern in Taxonomy Datasets using Concept Lattices

In general, frequent itemsets are generated from large data sets by applying various association rule mining algorithms, which produce many redundant frequent itemsets. In this paper we propose a new framework for non-redundant frequent itemset generation on taxonomy datasets, using closed frequent itemsets and concept lattices, without loss of information. General Terms Frequent Pattern, Assoc...

Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions

The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of "null" transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...

Mining Closed Itemsets: A Review

Closed itemset mining is a popular research topic in data mining. It was proposed to avoid the large number of redundant itemsets produced by frequent itemset mining. Various algorithms have been proposed with efficient strategies for generating closed itemsets. This paper studies the existing algorithms used to mine closed itemsets and presents and analyzes the strategies they employ.

ZART: A Multifunctional Itemset Mining Algorithm

In this paper, we present Coron, a domain-independent, multi-purpose data mining platform incorporating a rich collection of data mining algorithms. One of these algorithms is a multifunctional itemset mining algorithm called Zart, which is based on the Pascal algorithm, with some additional features. In particular, Zart is able to perform the following, usually independent,...

A lattice-based approach for mining most generalization association rules

Traditional association rules contain some redundant information. Variants based on support and confidence measures, such as non-redundant rules and minimal non-redundant rules, were thus proposed to reduce the redundancy. In the past, we proposed most generalization association rules (MGARs), which were more compact than (minimal) non-redundant rules in that they considered th...

Publication date: 2009